Multiple Descent: Design Your Own Generalization Curve

Neural Information Processing Systems

This paper explores the generalization loss of linear regression in variably parameterized families of models, both under-parameterized and over-parameterized. We show that the generalization curve can have an arbitrary number of peaks, and moreover, the locations of those peaks can be explicitly controlled. Our results highlight the fact that both the classical U-shaped generalization curve and the recently observed double descent curve are not intrinsic properties of the model family. Instead, their emergence is due to the interaction between the properties of the data and the inductive biases of learning algorithms.


Multiple Descents in Deep Learning as a Sequence of Order-Chaos Transitions

Wei, Wenbo, Le, Nicholas Chong Jia, Lai, Choy Heng, Feng, Ling

arXiv.org Artificial Intelligence

In deep learning, understanding the training dynamics has become paramount for enhancing model performance, generalization, and robustness. The training of deep neural networks involves navigating complex, high-dimensional parameter spaces, where the interplay between model complexity, dataset characteristics, and learning algorithms dictates the learning trajectory. This process is far from straightforward, often characterized by phenomena such as overfitting, underfitting, and various forms of descent in performance metrics. The dynamics of training deep neural networks are critical for several reasons. Generalization is a primary concern in machine learning, focusing on the model's ability to generalize from training data to unseen data.




Learning Curves for Sequential Training of Neural Networks: Self-Knowledge Transfer and Forgetting

Karakida, Ryo, Akaho, Shotaro

arXiv.org Machine Learning

Sequential training from task to task is becoming one of the central topics in deep learning applications such as continual learning and transfer learning. Nevertheless, it remains unclear under what conditions the trained model's performance improves or deteriorates. To deepen our understanding of sequential training, this study provides a theoretical analysis of generalization performance in a solvable case of continual learning. We consider neural networks in the neural tangent kernel (NTK) regime that continually learn target functions from task to task, and investigate the generalization by using an established statistical-mechanical analysis of ridgeless kernel regression. We first show characteristic transitions from positive to negative transfer: targets whose similarity exceeds a specific critical value achieve positive knowledge transfer to the subsequent task, whereas catastrophic forgetting can occur even between very similar targets. Next, we investigate a variant of continual learning in which the model learns the same target function in multiple tasks. Even for the same target, the trained model shows some transfer and forgetting depending on the sample size of each task. We can guarantee that the generalization error monotonically decreases from task to task for equal sample sizes, while unbalanced sample sizes deteriorate generalization. We refer to this improvement and deterioration as self-knowledge transfer and forgetting, respectively, and empirically confirm them in realistic training of deep neural networks as well.
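The same-target variant described above can be sketched numerically. The snippet below is a minimal, hedged illustration: it uses an RBF kernel as a stand-in for the NTK (the paper works with the exact NTK), and trains a near-ridgeless kernel regressor task after task by fitting each new task's residuals, mimicking continued gradient training to convergence in the kernel regime. The helpers `rbf`, `predict`, `sequential_fit`, and `make_task`, and the `sin(3x)` target, are all assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, gamma=1.0):
    # RBF kernel as a simple stand-in for the NTK (an assumption).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def predict(components, X):
    # Predictor accumulated over tasks: f(x) = sum_i K(x, X_i) @ alpha_i.
    f = np.zeros(len(X))
    for Xi, ai in components:
        f += rbf(X, Xi) @ ai
    return f

def sequential_fit(tasks):
    # Each task fits the residual of the current predictor with
    # (near-)ridgeless kernel regression; the tiny jitter is only
    # for numerical stability.
    components = []
    for X, y in tasks:
        resid = y - predict(components, X)
        K = rbf(X, X) + 1e-6 * np.eye(len(X))
        components.append((X, np.linalg.solve(K, resid)))
    return components

def target(x):
    # Same noiseless target function in every task (an assumption).
    return np.sin(3 * x).ravel()

def make_task(n):
    X = rng.uniform(-1, 1, size=(n, 1))
    return X, target(X)

X_test = np.linspace(-1, 1, 200)[:, None]
y_test = target(X_test)

# One task vs. two sequential tasks with equal sample sizes.
err_one = np.mean((predict(sequential_fit([make_task(30)]), X_test) - y_test) ** 2)
err_two = np.mean((predict(sequential_fit([make_task(30), make_task(30)]), X_test) - y_test) ** 2)
# With equal sample sizes, the fresh samples of the second task
# typically reduce test error further (self-knowledge transfer).
```

Making the second task much smaller than the first is the unbalanced regime in which the abstract's self-knowledge forgetting can appear.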